Support MERGE on cloned table in Delta Lake #24756
USING DELTA
TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

INSERT INTO clone_merge_deletion_vector_source VALUES
This insert creates a single row in each Parquet file, so the following delete just removes entire files and no deletionVector appears in the transaction log.
The log should contain something like:
{"add":{"path":"20250206_113723_00025_y7b6p_79b0624a-b032-4e9f-b67b-9c84b9960729","partitionValues":{},"size":511,"modificationTime":1738841844050,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"id\":2,\"v\":\"updated\",\"part\":\"2024-01-01\"},\"maxValues\":{\"id\":4,\"v\":\"updated\",\"part\":\"2024-02-02\"},\"nullCount\":{\"id\":0,\"v\":0,\"part\":0}}","tags":{},"deletionVector":{"storageType":"u","pathOrInlineDv":"-z*atcBlDyPB90fEl>c^","offset":1,"sizeInBytes":38,"cardinality":1}}}
I'm not sure how to force more values into one file, though. @ebyhr, do you know?
Do you want to see the cloned table read the source table's DVs?
Reading the 'p' type of DV, which is what the cloned table contains, is not supported yet.
Even if we supported reading the 'p' type DV, we would still need to change the path in the DV to a relative path, because we load the table from resources and don't know the prefix of the absolute path. That would require extra logic to also allow relative paths in the implementation that reads absolute-path DVs.
Given these constraints, adding the DV tests to the product tests seems more natural. WDYT?
Filed #24946. Once that is finalized, we can add this test by modifying the referenced paths when loading tables.
So a cloned table will always have the 'p' type, I assume? If that's the case, then it's also OK that cloned tables with deletion vectors are not supported. So either #24946 can be treated as a prerequisite, or we leave this test in its current state and add a product test that shows the Trino failure in such a case.
So far it seems so. I'd prefer to update the test once reading 'p' type DVs is supported.
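To check whether a fixture's transaction log actually contains deletion vectors, and which storage type they use, the log can be scanned for `add` actions that carry a `deletionVector` field. The following is a minimal sketch, not part of this PR: the function name `deletion_vector_types` and the shortened sample entries are hypothetical, and it assumes the log is newline-delimited JSON in the shape of the `add` entry quoted in the thread.

```python
import json


def deletion_vector_types(log_lines):
    """Return the storageType of every deletion vector attached to an `add` action."""
    types = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # not an `add` action (e.g. `remove`, `commitInfo`)
        dv = add.get("deletionVector")
        if dv is not None:
            types.append(dv["storageType"])
    return types


# Hypothetical, shortened log entries for illustration only.
sample_log = [
    '{"add":{"path":"part-0.parquet","dataChange":true}}',
    '{"add":{"path":"part-1.parquet","dataChange":true,'
    '"deletionVector":{"storageType":"u","pathOrInlineDv":"abc",'
    '"offset":1,"sizeInBytes":38,"cardinality":1}}}',
    '{"remove":{"path":"part-2.parquet","dataChange":true}}',
]

print(deletion_vector_types(sample_log))  # prints ['u']
```

An empty result for a fixture would indicate the situation described in the first comment: the delete removed whole files and produced no deletion vectors.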
Description
Fixes a problem where UPDATE fails on a cloned table. Steps to reproduce:
testing/bin/ptl env up --environment singlenode-delta-lake-oss
In Trino:
create schema delta.tiny with (location='s3://test-bucket/tiny/');
In Spark-sql:
CREATE TABLE tiny.t1 (id int, v string, part date) USING DELTA PARTITIONED BY (part);
In Trino:
insert into delta.tiny.t1 values (1, 'A', TIMESTAMP '2024-01-01'), (2, 'B', TIMESTAMP '2024-01-01'), (3, 'C', TIMESTAMP '2024-02-02'), (4, 'D', TIMESTAMP '2024-02-02');
In Spark-sql:
CREATE TABLE tiny.t1clone SHALLOW CLONE tiny.t1;
In Trino:
update delta.tiny.t1clone set v = 'update1' where id in (1,3);
It fails with:
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: